1980: Analysis of Categorical Data in Weighted Cluster Sample Surveys
Authors
Abstract
A weighted cluster sample survey design is frequently used in large demographic sample surveys. In the National Health Interview Survey conducted by the National Center for Health Statistics, households are often selected in clusters of four. In this survey, sociodemographic and health characteristics of all members of sample households are recorded. Such characteristics for each person interviewed are multiplied by a known weight that is approximately the inverse of the probability of being included in the sample on the basis of the post-stratified geographic and demographic domain of each individual. This type of weighting is necessary to estimate certain characteristics of the target population at reasonable cost in large sample survey situations. Cohen (1976) discussed the distribution of the chi-squared statistic from contingency tables in cluster sampling when clusters consist of two members. Altham (1976) generalized Cohen's results for clusters of M members. In the present research, these results are further extended to the weighted cluster sample survey. A new chi-squared statistic is used to analyze data from cluster sampling and weighted cluster sampling, and these two results are compared. This statistic is useful in the analysis of complex survey data for investigating the effect of weighting in cluster sample survey situations. Illustrative data from the 1975 National Health Interview Survey are analyzed by these new methods.
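As a rough illustration of the weighting step described in the abstract, the sketch below builds a weighted two-way contingency table from person-level records and computes the ordinary Pearson chi-squared statistic on it. This is not the new statistic proposed in the paper: it ignores within-cluster dependence, which is exactly what the results of Cohen (1976) and Altham (1976) address. The function names, the record layout, and the sample data are all hypothetical, and rescaling the weights to sum to the sample size is an assumption made here so that the statistic stays on the scale of the observed counts.

```python
from collections import defaultdict


def rescale_weights(records):
    """Rescale weights so they sum to the number of records (an assumption
    of this sketch, not a step taken from the paper)."""
    n = len(records)
    total_weight = sum(w for _, _, w in records)
    return [(r, c, w * n / total_weight) for r, c, w in records]


def weighted_table(records):
    """Sum the weights of all records falling in each (row, column) cell."""
    table = defaultdict(float)
    for row, col, weight in records:
        table[(row, col)] += weight
    return dict(table)


def pearson_chi_squared(table):
    """Ordinary Pearson chi-squared for independence on a two-way table.

    Treats the weighted cell totals as if they were independent counts,
    so it does NOT correct for clustering.
    """
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_tot = {r: sum(table.get((r, c), 0.0) for c in cols) for r in rows}
    col_tot = {c: sum(table.get((r, c), 0.0) for r in rows) for c in cols}
    stat = 0.0
    for r in rows:
        for c in cols:
            expected = row_tot[r] * col_tot[c] / total
            observed = table.get((r, c), 0.0)
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical records: (row category, column category, survey weight),
# where each weight stands in for an inverse inclusion probability.
sample = [
    ("urban", "condition", 1200.0),
    ("urban", "no condition", 1500.0),
    ("urban", "no condition", 1100.0),
    ("rural", "condition", 900.0),
    ("rural", "condition", 2000.0),
    ("rural", "no condition", 1300.0),
]
print(pearson_chi_squared(weighted_table(rescale_weights(sample))))
```

Comparing this value with the same statistic computed from unit weights gives a crude view of how much the weighting alone changes the result; a design-based analysis would also need to account for the cluster structure.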
Similar resources
Improved K-Modes for Categorical Clustering Using Weighted Dissimilarity Measure
K-Modes is an extension of the K-Means clustering algorithm, developed to cluster categorical data, where the mean is replaced by the mode. The similarity measure proposed by Huang is the simple matching or mismatching measure. The weights of attribute values contribute much to clustering; thus in this paper we propose a new weighted dissimilarity measure for K-Modes, based on the ratio of frequency...
Weighted delta factor cluster ensemble algorithm for categorical data clustering in data mining
Though many cluster ensemble approaches came forward as a potential and dominant method for enhancing the robustness, stability and the quality of individual clustering systems, it is often observed that this approach in most cases generates a final data partition with deficient information. The primary ensemble information matrix generated in the traditional cluster ensemble approaches resu...
Wheat and barley seed system in Syria: How diverse are wheat and barley varieties and landraces from farmer’s fields?
The present study described the diversity of wheat and barley varieties and landraces available in farmer’s fields in Syria using different indicators. Analysis of spatial and temporal diversity and coefficient of parentage along with measurements of agronomic and morphological traits were employed to explain the diversity of wheat and barley varieties or landraces grown by farmers in Syria. Farm...
Incremental entropy-based clustering on categorical data streams with concept drift
Clustering on categorical data streams is a relatively new field that has not received as much attention as static data and numerical data streams. One of the main difficulties in categorical data analysis is the lack of an appropriate way to define the similarity or dissimilarity measure on data. In this paper, we propose three dissimilarity measures: a point-cluster dissimilarity measure (base...
Clustering From Categorical Data Sequences
The three-parameter cluster model is a combinatorial stochastic process that generates categorical response sequences by randomly perturbing a fixed clustering parameter. This clear relationship between the observed data and the underlying clustering is particularly attractive in cluster analysis, in which supervised learning is a common goal and missing data is a familiar issue. The model is w...